Methods And Apparatus For Adaptive Gain Control In A Communication System

ABSTRACT

Methods and apparatus for a communication system having microphones and loudspeakers to determine a noise and speech level estimate for a transformed signal, determine a SNR from the noise and speech level estimates, and determine a gain for the transformed signal to achieve a selected SNR range at a given position. In one embodiment, the gain is determined by adapting an actual gain to follow a target gain, wherein the target gain is adjusted to achieve the selected SNR range.

BACKGROUND

As is known in the art, certain communication systems, such as in-car communication (ICC) systems, enable passengers in a car to communicate with each other in a more comfortable manner. Conventional systems can include microphones and loudspeakers to receive and transmit sound within the car. For example, an ICC system can enable a driver to focus on the road while talking to passengers in the rear of the car without turning towards a passenger. Due to the driver's forward-facing head orientation, the driver may have to speak loudly in order to compensate for acoustic loss. For the reverse problem of having a conversation from rear to front, the problem is less dominant but still present.

SUMMARY

In exemplary embodiments of the invention, a communication system includes a series of microphones for picking up the speech of a speaker by performing speech enhancement processing. The system can play back the enhanced speech signal via loudspeakers dedicated to the listener. The system provides a relatively constant audio impression for a variety of noise and driving conditions.

In general, a communication system in accordance with exemplary embodiments of the invention optimizes gain control to achieve an improved audio impression. A target function for gain control can take psychoacoustic effects into consideration, provide improved gain adaptation, and enable a relatively compact implementation. With this arrangement, a system is provided with considerably reduced tuning effort and more natural audio impression as compared with conventional systems.

In exemplary embodiments, a communication system fuses AGC (automatic gain control), which compensates for speaker-dependent speech volumes, and NDGC (noise dependent gain control), which provides voice amplification in adverse environments. In some embodiments, loss control, which is used to switch between front and rear passengers, can also be fused into a unitary gain control module that uses a target function for gain control.

In one embodiment, a communication system provides a relatively constant audio impression for listeners given by a predefined Signal-To-Noise (SNR) target range at the listener's position. When AGC and NDGC are combined with loss control, a flexible gain provides fast switching in cross-talk scenarios and smooth transitions when the background noise changes. A more natural audio impression can be achieved by using psychoacoustic measures to determine the actual SNR and the target SNR.

In one aspect of the invention, a method comprises: for a communication system having microphones and loudspeakers, transforming a signal received by a first one of the microphones to the frequency domain; determining a noise level estimate for the transformed signal; determining a speech level estimate for the transformed signal; determining a SNR from the noise and speech level estimates; and determining a gain for the transformed signal to achieve a selected SNR range at a given position, wherein determining the gain comprises: adapting an actual gain to follow a target gain, wherein the target gain is adjusted to achieve the selected SNR range; comparing the target gain and the actual gain to determine a gain change increment; increasing the actual gain if the SNR at the given position is lower than a minimum SNR in the SNR range; and decreasing the actual gain if the SNR at the given position is higher than a maximum SNR in the SNR range.

The method can further include one or more of the following features: adapting the SNR range in response to a change in the noise level, using a psychoacoustic measure for determining the noise and/or speech level estimate, determining a noise level estimate for the transformed signal using spectral weights based upon the psychoacoustic measure, performing voice activity detection on the transformed signal and generating weighting factors from the voice activity detection to detect an active speaker, determining mixer weights to select a speaker-specific gain for the active speaker, attenuating the gain during speech pauses detected by the voice activity detection, determining the gain for different frequencies, in a bi-directional system, setting a maximum gain to block a loudspeaker output of an inactive communication system, using a first one of the microphones to determine the SNR at the given position, and/or the given position corresponds to a position proximate an expected location of a user ear.

In another aspect of the invention, an article comprises: a non-transitory computer-readable medium having stored instructions that enable a machine to: for a communication system having microphones and loudspeakers, transform a signal received by a first one of the microphones to the frequency domain; determine a noise level estimate for the transformed signal; determine a speech level estimate for the transformed signal; determine a SNR from the noise and speech level estimates; and determine a gain for the transformed signal to achieve a selected SNR range at a given position, wherein determining the gain comprises: adapt an actual gain to follow a target gain, wherein the target gain is adjusted to achieve the selected SNR range; compare the target gain and the actual gain to determine a gain change increment; increase the actual gain if the SNR at the given position is lower than a minimum SNR in the SNR range; and decrease the actual gain if the SNR at the given position is higher than a maximum SNR in the SNR range.

The article can further include one or more of the following features: instructions to adapt the SNR range in response to a change in the noise level, instructions to use a psychoacoustic measure for determining the noise and/or speech level estimate, instructions to perform voice activity detection on the transformed signal and generate weighting factors from the voice activity detection to detect an active speaker, instructions to determine mixer weights to select a speaker-specific gain for the active speaker, and/or, in a bi-directional system, instructions to set a maximum gain to block a loudspeaker output of an inactive communication system.

In a further aspect of the invention, a communication system comprises: microphones to receive sound in an enclosed environment; loudspeakers to generate sound into the enclosed environment; a sound processing module to transform a signal received by a first one of the microphones to the frequency domain; a noise estimate module to determine a noise level estimate for the transformed signal; a speech estimate module to determine a speech level estimate for the transformed signal; a gain control module comprising: a SNR module to determine a SNR from the noise and speech level estimates; and a gain module to determine a gain for the transformed signal to achieve a selected SNR range at a given position, the gain module including a processor configured to: adapt an actual gain to follow a target gain, wherein the target gain is adjusted to achieve the selected SNR range; compare the target gain and the actual gain to determine a gain change increment; increase the actual gain if the SNR at the given position is lower than a minimum SNR in the SNR range; and/or decrease the actual gain if the SNR at the given position is higher than a maximum SNR in the SNR range.

The system can further be configured to include one or more of the following features: the gain module is further configured to adapt the SNR range in response to a change in the noise level, use a psychoacoustic measure for determining the noise and/or speech level estimate, determine a noise level estimate for the transformed signal using spectral weights based upon the psychoacoustic measure, perform voice activity detection on the transformed signal and generate weighting factors from the voice activity detection to detect an active speaker, determine mixer weights to select a speaker-specific gain for the active speaker, attenuate the gain during speech pauses detected by the voice activity detection, determine the gain for different frequencies, in a bi-directional system, set a maximum gain to block a loudspeaker output of an inactive communication system, use a first one of the microphones to determine the SNR at the given position, and/or the given position corresponds to a position proximate an expected location of a user ear.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing features of this invention, as well as the invention itself, may be more fully understood from the following description of the drawings in which:

FIG. 1 is a schematic representation of a communication system having gain control in accordance with exemplary embodiments of the invention;

FIG. 2A is a schematic representation of a unidirectional in car communication system having gain control;

FIG. 2B is a schematic representation of a bidirectional in car communication system having gain control;

FIG. 3 is a schematic representation of a communication system having microphones and loudspeakers and gain control;

FIG. 4 is a schematic representation of a gain control module that can form a part of the system of FIG. 3;

FIG. 5 is a flow diagram of an exemplary sequence of steps for implementing gain control;

FIG. 6 is a schematic representation of a speech signal enhancement system having exemplary robust speaker activity detection that can be used in the gain control system of FIG. 3;

FIG. 7 is a schematic representation of an exemplary microphone selection system having robust speaker activity detection; and

FIG. 8 is a schematic representation of an exemplary computer that can perform at least a portion of the processing described herein.

DETAILED DESCRIPTION

FIG. 1 shows an exemplary communication system 100 including a speech signal enhancement system 102 having a gain control module 104 in accordance with exemplary embodiments of the invention. A microphone array 106 includes one or more microphones 106 a-N to receive sound information, such as speech from a human speaker. It is understood that any practical number of microphones 106 can be used to form a microphone array. Respective pre-processing modules 108 a-N can process information from the microphones 106 a-N. Exemplary pre-processing modules 108 can include echo cancellation. Additional signal processing modules can include beamforming 110, noise suppression 112, wind noise suppression 114, transient removal 116, speaker/voice activity detection 118, etc.

While exemplary embodiments of the invention are shown and described in conjunction with an in car communication (ICC) system, it is understood that embodiments of the invention are applicable to communication systems in general in which SNR-based gain control is desirable within an enclosed space.

FIG. 2A shows a unidirectional in car communication (ICC) system 150. A vehicle 152 includes a series of loudspeakers 154 and microphones 156 within the passenger compartment. In one embodiment, the passenger compartment includes a microphone 156 for each passenger. In another embodiment (not shown), each passenger has a microphone array.

In general, the system 150 should achieve a relatively constant audio impression for a variety of noise and driving conditions. One or more microphones 156 are directed to the driver and the front passenger. Speech enhancement is applied to the microphone 156 signals, as described more fully below.

FIG. 2B shows a vehicle 152′ having first and second ICC systems 150 a,b to provide bidirectional operation. As described more fully below, a loss control module 158 can be provided to apply appropriate attenuation for the inactive ICC system.

Bidirectional communication is provided in which speech signals originating from rear speakers are picked up by additional microphones 157 in the rear of the car. The speech signal is enhanced and amplified before it is played back over front loudspeakers 155. Front loudspeakers 155 and rear microphones 157 are added over the unidirectional embodiment of FIG. 2A.

Both front-to-rear and rear-to-front communication are typically calculated more or less independently from each other by the respective ICC systems 150 a, 150 b. In exemplary embodiments, the first and second ICC systems 150 a,b are used in parallel. However, in exemplary embodiments, only one direction is active at a time in order to avoid instabilities caused by feedback from loudspeakers of one direction into the microphones of the opposite direction.

In general, bidirectional communication is more convenient for rear and front passengers. Using more microphones allows retrieval of more precise information about the acoustic environment within the car. Having only one ICC system 150 a,b active at any given time avoids the speech of rear passengers being picked up by front microphones and played back again to the rear passengers.

The ICC system 150 compensates for acoustic loss between passengers in a car or other enclosed space. The communication system 150 can use a number of microphones 156 for picking up the speech of a speaker, performing speech enhancement, and playing back the enhanced speech signal via loudspeakers 154 dedicated to the listener. Voice amplification can compensate for the acoustic loss from the speaker's mouth to the listener's ear.

As described more fully below, signal gain is controlled depending on the speaker's speech volume and the surrounding background noise level. Appropriate equalizing ensures stability of the system and sound quality. The enhanced speech signal is played back over the rear loudspeakers 154 of the car. Equalization and gain control preserve the natural localization of the speaker. The listener should still have the impression that the speech signal is coming from the speaker, e.g. driver or front passenger, and not from the loudspeaker. However, the speaker does not want to be aware of his or her own voice played back over the loudspeakers.

FIG. 3 shows an exemplary communication system 300 having gain control 302 in accordance with exemplary embodiments of the invention. Signals from microphones 304 a,b are first high-pass filtered to suppress low-frequency background noise (not shown). The filtered microphone 304 signal is then transformed by a filter bank into the frequency domain, such as by FFT processing modules 306 a,b. In parallel, the microphone noise levels are tracked in respective noise estimation modules 308 a,b.

Noise suppression modules 310 a,b can calculate frequency domain spectral weighting coefficients to moderate sibilant sounds (de-essing) and to suppress background noise, for example, in the microphone signals. As is known in the art, de-essing refers to any technique for reducing excessive prominence of sibilant consonants, such as “s”, “z” and “sh” in recordings of the human voice. Excessive sibilance can be caused by compression, microphone choice and technique, and the like. Sibilance typically lies in frequencies between 2 and 10 kHz.

Speech estimation modules 312 a,b receive the noise suppressed microphone signals and output respective speech estimate signals that are provided to the gain control module 302 and respective voice activity detection modules 314 a,b. Generation of speech estimate signals can be performed using any suitable technique known to one of ordinary skill in the art.

A mixer 316 detects which speaker, e.g. driver or front-passenger, is talking. Feedback suppression detects residual feedback artifacts (from loudspeaker to microphone) and applies temporal attenuation if required.

As described more fully below, the gain control module 302 provides Automatic Gain Control (AGC) and Noise Dependent Gain Control (NDGC) to control the overall amplification of the speaker's speech signal. After transformation back to the time domain, such as by an IFFT module 318, the enhanced signal is then split into a number of output channels each of which is coupled to a respective equalizer 320. Each signal is played back over a loudspeaker or loudspeaker pair 322 dedicated to the corresponding output channel.

Since Voice Activity Detection (VAD) 314 is used prior to gain control 302, the system can handle speech utterances and speech pauses appropriately, e.g., by reducing the gain during long speech pauses. Noise estimation 308 and speech level estimation 312 are calculated in the frequency domain to allow spectral weighting. In one embodiment, noise estimation 308 includes the use of spectral weights according to the ITU-R 468 standard, which is based on psychoacoustic experiments under the objective to obtain aurally compensated noise estimates.

In one embodiment, speech level estimation 312 uses so-called A-/B-/C-weighting in order to approximate the well-known equal loudness contours in a manner known in the art. The aggressiveness of noise suppression may be controlled by the noise estimate 308 including spectral weighting to achieve a good compromise between speech quality and masking background noise. It is understood that any suitable speech and noise level estimation techniques can be used. In one embodiment, speech and noise level are jointly estimated in a voice activity detection module.
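As an illustration of spectrally weighted level estimation, the following minimal sketch applies the standard analytic A-weighting curve to a power spectrum before summing it into a single level. The function names, the choice of the A curve (rather than B or C), and the 1e-12 power floor are assumptions made for this example; the description above does not prescribe a particular formula or implementation.

```python
import math

def a_weighting_db(freq_hz: float) -> float:
    """A-weighting in dB at freq_hz, using the standard analytic curve
    (normalized so that the weight at 1 kHz is approximately 0 dB)."""
    f2 = freq_hz ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(num / den) + 2.00

def weighted_level_db(power_spectrum, bin_freqs_hz):
    """Hypothetical helper: sum subband powers after A-weighting and
    return the resulting level in dB."""
    total = 0.0
    for power, freq in zip(power_spectrum, bin_freqs_hz):
        if freq <= 0.0:
            continue  # skip DC, where the curve is undefined
        total += power * 10.0 ** (a_weighting_db(freq) / 10.0)
    return 10.0 * math.log10(max(total, 1e-12))  # floor is an assumption
```

A noise level estimate could be weighted in the same way with a different curve, such as the ITU-R 468 weights mentioned above, by swapping the weighting function.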

After gain control 302, the enhanced signal is transferred back into the time domain 318, equalized 320, and then played back over the loudspeaker(s) 322.

Exemplary embodiments of the invention provide AGC, NDGC, and loss control. In exemplary embodiments of the invention, automatic gain control attenuates the speech signal of loud speakers. As is known in the art, speakers typically speak louder in adverse environments, e.g., high noise situations, known as the Lombard effect. In general, the AGC module tracks speech energy and determines whether the speech volume is still within an acceptable range. If the speech volume exceeds a certain limit, the ICC gain is reduced relatively rapidly. Otherwise, attenuation is slowly reduced.

Noise Dependent Gain Control (NDGC) provides a constant audio impression for listeners. Since the background noise level in automotive environments is highly time variant, the ICC system adapts voice amplification continuously. Beyond a given noise threshold, the ICC gain is continuously increased according to configurable NDGC characteristics. NDGC is expected to react relatively slowly to moderate changes in background noise.

Loss control is used in a bidirectional ICC system having two or more independent ICC systems. Loss control avoids closed-loop amplification of both ICC systems by applying appropriate attenuation for the inactive ICC system. Loss control ensures fast switching in case of speaker changes between speakers in the front and the rear of a car. Loss control ensures that only one ICC system is active at a given time.

Exemplary embodiments of the invention fuse AGC, NDGC and loss control in a gain control module that uses a target function to provide a relatively constant audio impression for listeners given by a selected SNR target range at the listener's position. In one embodiment, a natural audio impression is achieved by using psychoacoustic measures to determine the actual SNR and the target SNR. The fact that a listener hears both the loudspeaker output signal and the direct sound from the speaker can be taken into consideration since the superposition of both signals affects some other psychoacoustic effects such as acoustic localization.

FIG. 4 shows an exemplary gain control module 400, which can be provided as the gain control module 302 of FIG. 3. As noted above, speakers typically differ in speech volume, which complicates designing an ICC system that supports all speakers in the same manner. With respect to system stability, speakers who speak loudly can be more problematic. On the other hand, one can expect that loud speakers are more intelligible for other passengers since they achieve better Signal-to-Noise Ratios (SNR). In this case less amplification of the system is required.

Noise estimate modules 308 a,b (FIG. 3) provide respective noise estimate signals p_(N)^(Mic0), p_(N)^(Mic1) for the signals from the first and second microphones 304 a,b. Speech estimate modules 312 a,b provide speech estimate signals p_(s)^(Mic0), p_(s)^(Mic1) for the signals from the first and second microphones 304 a,b. The weighting factors ω_(Mixer) and the input signal sig_(In) are provided by the signal mixer 316. The voice activity detection (VAD) modules 314 a,b provide respective signals b_(VAD)^(Mic0), b_(VAD)^(Mic1) for the first and second microphones to indicate which speaker, e.g., driver or front passenger, is active. The gain control module 400 provides an output signal sig_(out) that can be equalized across the various channels, as described above.

In an exemplary embodiment, the speech level estimate signals p_(s)^(Mic0), p_(s)^(Mic1) are buffered 402 a,b for each speaker separately. In general, noise and speech estimates can be adapted independently during speech pauses/speech utterances. Based on noise and speech level estimates p_(N)^(Mic0), p_(N)^(Mic1), p_(s)^(Mic0), p_(s)^(Mic1), the gain 404 a,b is calculated to achieve a predetermined SNR range 406 a,b at the listener's ear. In addition to a target SNR range, the VAD tags b_(VAD)^(Mic0), b_(VAD)^(Mic1) can be used to attenuate the ICC gain during long speech pauses to avoid high amplification of residual background noise or non-stationary noise. The mixer weights ω_(Mixer) determine which speaker-specific gain should be used, which can be useful in cross-talk situations.
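One possible way to combine the mixer weights and VAD tags described above is sketched below. The selection rule (take the gain of the speaker with the largest mixer weight), the pause counter, and the default values for `long_pause_frames` and `pause_attenuation_db` are hypothetical choices made for this example and are not taken from the description.

```python
def update_pause_counter(pause_frames, vad_flags):
    """Reset the pause counter when any VAD tag is set, else count up."""
    return 0 if any(vad_flags) else pause_frames + 1

def combined_gain_db(gains_db, mixer_weights, pause_frames,
                     long_pause_frames=100, pause_attenuation_db=6.0):
    """Pick the speaker-specific gain with the largest mixer weight and
    attenuate it once a speech pause has lasted long enough.

    All default values are illustrative assumptions.
    """
    active = max(range(len(gains_db)), key=lambda i: mixer_weights[i])
    gain = gains_db[active]
    if pause_frames >= long_pause_frames:
        gain -= pause_attenuation_db  # avoid amplifying residual noise
    return gain
```

A softer alternative would interpolate between the speaker-specific gains according to the mixer weights instead of selecting a single one.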

In exemplary embodiments of the invention, instead of calculating AGC based on speech level and NDGC based on noise level independently, a target function achieves an approximately constant SNR range with respect to the listener's ear position. To avoid mismatches between perceived and measured SNRs, the use of psychoacoustic measures may be helpful, as noted above.

As long as the ICC gain is relatively small compared to the inverse of the coupling factor representing the feedback from loudspeaker to microphone, a proportional relation is expected between the increase in gain and the resulting SNR. When the ICC gain approaches the maximum gain, it is expected that the resulting SNR reacts disproportionately strongly to an increase in the ICC gain. For example, the coupling factors for speaker-to-microphone, speaker-to-listener and loudspeaker-to-listener, and the corresponding delays, can be used in order to estimate the SNR at the listener's ear position based on the SNR measured at the microphone. Furthermore, a model of the audio localization can be created according to the Haas effect. Both help achieve a constant audio impression for the listener and improve the audio localization for speaker and listener.

In one embodiment, this is achieved by an appropriate upper limit of the ICC gain. In a bidirectional ICC system the microphones dedicated to the listener can be used in order to measure the SNR directly.

A bidirectional system, such as the system shown in FIG. 2B, can use the microphone configuration to estimate the ratio of direct sound and loudspeaker output signal. A loss control module can evaluate the power ratio of the microphone signal and the direct sound recorded by the listener's microphone in case of single-talk scenarios. In one embodiment, the system measures this ratio a priori and allows some adaptation in a predetermined range. For example, this can be used to detect whether the driver turns his head while speaking. This would increase the direct sound and lower the signal power measured at the microphone.

In an exemplary embodiment, a target gain and an actual gain are used. Actual gain refers to the gain which is directly applied to the speech signal. The actual gain is continuously adapted to follow the target gain. In one embodiment, the target gain is calculated as follows:

G _(target)(Ω)=min{G(Ω),G _(max)(Ω)}  (1)

where G denotes the gain necessary to achieve the target SNR. For example: G[dB]=SNR_target[dB]−SNR_measured[dB], where the SNR estimates include knowledge about speech level and background noise, as well as the psychoacoustic weighting. In equation (1) above, G is limited to obtain the target gain which should be achieved. Equation (2) below defines the adaptation scheme if condition (3) or condition (4) is fulfilled.

The gain can be calculated for each frequency separately or it can be averaged over all frequencies or a given frequency band. The maximum gain G_(max) is the limit where the feedback from loudspeaker to microphone severely affects system stability and sound quality. In addition, the limit G_(max) can be controlled adaptively. For example, if instabilities are detected, one could reduce the gain only for a few frequencies instead of reducing the gain for all frequencies. Spectral weighting factors of other modules such as noise suppression can also be taken into account.
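Equation (1) and the example relation for G can be sketched as follows, with all quantities expressed in dB. The function names are hypothetical. Because the conversion to dB is monotonic, taking the minimum in dB gives the same result as taking it on linear gains.

```python
def target_gain_db(snr_target_db, snr_measured_db, g_max_db):
    """Equation (1) for a single band: G_target = min{G, G_max},
    with G[dB] = SNR_target[dB] - SNR_measured[dB]."""
    g_db = snr_target_db - snr_measured_db
    return min(g_db, g_max_db)

def target_gain_per_band_db(snr_target_db, snr_measured_db, g_max_db):
    """Per-frequency variant: one measured SNR and one limit per band."""
    return [target_gain_db(snr_target_db, snr, g_max)
            for snr, g_max in zip(snr_measured_db, g_max_db)]
```

Lowering individual entries of the per-band limit list corresponds to reducing the gain only for a few frequencies when instabilities are detected.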

The actual gain is adapted step-by-step until the target gain G_(target) is reached in order to achieve smooth gain transitions. Target gain and actual gain are compared to obtain the gain increment/decrement defined below:

G _(Δ)[dB]=(G _(target)[dB]−G _(actual)[dB])·Δ_(inc/dec)  (2)

The gain increment is applied to the actual gain. By using an adaptive gain, the ICC gain will react only slightly to small log-ratios but will adapt rapidly to large values. The first effect stabilizes the audio impression in relatively time-invariant acoustic environments. The second effect minimizes the delay of gain control in case of rapid changes.

In one embodiment, the gain increases only if the SNR at the listener's ear position is lower than the predefined SNR range:

SNR<SNR _(min)

G _(Δ)[dB]>0  (3)

On the other hand, the gain should be decreased only if the SNR is too high:

SNR>SNR _(max)

G _(Δ)[dB]<0  (4)

This arrangement combines NDGC and AGC in one gain control module. In addition, a SNR range [SNR_(min); SNR_(max)] is useful to preserve natural SNR fluctuations during speech utterances. In one embodiment, the SNR range is adapted when the noise level increases to reflect the Lombard effect.

FIG. 5 shows an exemplary sequence of steps for implementing gain control. In step 500, target and actual gain are compared. From the comparison, a gain change increment is determined in step 502. In step 504, it is determined whether the SNR is less than a minimum SNR of the SNR range. If so, in step 506 the gain is incremented by the computed gain change increment. If not, in step 508 it is determined whether the SNR is greater than a maximum SNR of the SNR range. If so, in step 510 the gain is decreased by the computed gain change increment. After any gain increase/decrease, optional loss control is performed in step 512.
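The sequence of FIG. 5, together with Equation (2) and conditions (3) and (4), might be sketched as below. This is a minimal illustration: the function name and the reading of Δ_(inc/dec) as two separate step factors, one for increases and one for decreases, are assumptions made for this example, and the optional loss control of step 512 is omitted.

```python
def adapt_gain_db(actual_db, target_db, snr_db, snr_min_db, snr_max_db,
                  step_inc=0.1, step_dec=0.1):
    """One adaptation step of the actual gain (all values in dB).

    step_inc and step_dec are hypothetical step factors standing in
    for the increment/decrement factor of Equation (2).
    """
    diff_db = target_db - actual_db                    # step 500: compare
    step = step_inc if diff_db > 0 else step_dec
    delta_db = diff_db * step                          # step 502: Equation (2)
    if snr_db < snr_min_db and delta_db > 0:           # steps 504/506: condition (3)
        return actual_db + delta_db
    if snr_db > snr_max_db and delta_db < 0:           # steps 508/510: condition (4)
        return actual_db + delta_db
    return actual_db                                   # SNR within range: hold gain
```

Because the step is proportional to the difference between target and actual gain, small differences change the gain only slightly while large differences are closed quickly.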

A bidirectional system, such as that shown in FIG. 2B, requires gain control for the multiple ICC systems 150 a,b. In one embodiment, an appropriate upper limit G_(max) is set in accordance with Equation (1) to block the loudspeaker output of the inactive ICC system. A large difference of actual and target gain results in rapid adaptation and small delay.
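The effect of blocking the inactive direction through the upper limit of Equation (1) can be illustrated with the following sketch. The blocked limit of -40 dB, the step factor, and the function names are hypothetical, and the SNR conditions (3) and (4) are left out to keep the example short.

```python
def g_max_for_direction_db(is_active, g_max_stable_db, g_max_blocked_db=-40.0):
    """Upper gain limit for one ICC direction; the inactive direction
    receives a very low limit (illustrative value) to block its output."""
    return g_max_stable_db if is_active else g_max_blocked_db

def loss_control_step_db(actual_db, snr_gain_db, g_max_db, step=0.2):
    """Limit the gain as in Equation (1), then move the actual gain
    toward the limited target with a proportional step as in Equation (2)."""
    target_db = min(snr_gain_db, g_max_db)
    return actual_db + (target_db - actual_db) * step
```

With these illustrative values, a direction that was running at 6 dB and becomes inactive drops by about 9 dB in its first step, which reflects the rapid adaptation obtained from a large difference between actual and target gain.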

It is understood that any suitable speaker activity detection (SAD) can be used for the VAD module 314 a,b of FIG. 3. It is further understood that SAD and VAD are herein used interchangeably. An exemplary SAD system is shown and described in T. Matheja, M. Buck, and T. Fingscheidt, entitled “Speaker Activity Detection for Distributed Microphone Systems in Cars,” Proc. of the 6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems, September, 2013, which is incorporated herein by reference.

An exemplary SAD system is shown in FIG. 6, which shows an exemplary speech signal enhancement system 600 having a speaker activity detection (SAD) module 602 and an event detection module 604 coupled to a robust speaker detection module 606 that provides information to a speech enhancement module 608. In one embodiment, the event detection module 604 includes at least one of a local noise detection module 650, a wind noise detection module 652, a diffuse sound detection module 654, and a double-talk detection module 656.

The basic speaker activity detection (SAD) module 602 output is combined with outputs from one or more of the event detection modules 650, 652, 654, 656 to avoid a possible positive SAD result during interfering sound events. A robust SAD result can be used for further speech enhancement 608.

It is understood that the term robust SAD refers to a preliminary SAD evaluated against at least one event type so that the event does not result in a false SAD indication, wherein the event types include one or more of local noise, wind noise, diffuse sound, and/or double-talk.

In one embodiment, the local noise detection module 650 detects local distortions by evaluation of the spectral flatness of the difference between signal powers across the microphones, such as based on the signal power ratio. The spectral flatness measure in channel m for $\tilde{K}$ subbands can be provided as:

$$\chi_{m,\tilde{K}}^{SF}(\ell)=\frac{\exp\left\{\frac{1}{\tilde{K}}\sum_{k=0}^{\tilde{K}-1}\log\left(\max\left\{\mathrm{SPR}_{m}(\ell,k),\varepsilon\right\}\right)\right\}}{\frac{1}{\tilde{K}}\sum_{k=0}^{\tilde{K}-1}\max\left\{\mathrm{SPR}_{m}(\ell,k),\varepsilon\right\}}\qquad(5)$$

Energy-based speaker activity detection (SAD) evaluates a signal power ratio (SPR) in each of M ≥ 2 microphone channels. In embodiments, the processing is performed in the discrete Fourier transform domain with the frame index ℓ and the frequency subband index k at a sampling rate of ƒ_(s)=16 kHz, for example.
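The spectral flatness of Equation (5) is the ratio of the geometric mean to the arithmetic mean of the floored subband values. A minimal sketch follows; the function name and the default floor `eps` are assumptions, and the input is taken to be the per-subband signal power ratio values of one channel and one frame.

```python
import math

def spectral_flatness(subband_values, eps=1e-10):
    """Ratio of geometric to arithmetic mean of max(value, eps).

    Returns a value close to 1 for a flat spectrum and close to 0 for
    a spectrum dominated by a few subbands. eps is an assumed floor.
    """
    floored = [max(v, eps) for v in subband_values]
    count = len(floored)
    geometric_mean = math.exp(sum(math.log(v) for v in floored) / count)
    arithmetic_mean = sum(floored) / count
    return geometric_mean / arithmetic_mean
```

Restricting `subband_values` to the lowest subbands gives the band-limited variant used for wind noise detection.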

Temporal smoothing of the spectral flatness with γ_(SF) can be provided during speaker activity (SÃD_(m)(ℓ)>0), with the smoothed value decreasing with γ_(dec)^(SF) when there is no speaker activity, as set forth below:

$$\bar{\chi}_{m,\tilde{K}}^{SF}(\ell)=\begin{cases}\gamma_{SF}\cdot\bar{\chi}_{m,\tilde{K}}^{SF}(\ell-1)+\left(1-\gamma_{SF}\right)\cdot\chi_{m,\tilde{K}}^{SF}(\ell), & \text{if } \widetilde{\mathrm{SAD}}_{m}(\ell)>0,\\ \gamma_{dec}^{SF}\cdot\bar{\chi}_{m,\tilde{K}}^{SF}(\ell-1), & \text{else.}\end{cases}\qquad(6)$$

In one embodiment, the smoothed spectral flatness can be thresholded to determine whether local noise is detected. Local Noise Detection (LND) in channel m, with $\tilde{K}$ covering the whole frequency range and with threshold Θ_(LND), can be expressed as follows:

$$\mathrm{LND}_{m}(\ell)=\begin{cases}1, & \text{if } \bar{\chi}_{m,\tilde{K}}^{SF}(\ell)>\Theta_{LND},\\ 0, & \text{else.}\end{cases}\qquad(7)$$

In one embodiment, the wind noise detection module 652 thresholds the smoothed spectral flatness using a selected maximum frequency for wind. Wind noise detection (WND) in channel m, with $\tilde{K}$ being the number of subbands up to, e.g., 2000 Hz and with the threshold Θ_(WND), can be expressed as:

$$\mathrm{WND}_{m}(\ell)=\begin{cases}1, & \text{if } \left(\bar{\chi}_{m,\tilde{K}}^{SF}(\ell)>\Theta_{WND}\right)\wedge\left(\mathrm{LND}_{m}(\ell)<1\right),\\ 0, & \text{else.}\end{cases}\qquad(8)$$

It is understood that the maximum frequency, number of subbands, smoothing parameters, etc., can be varied to meet the needs of a particular application. It is further understood that other suitable wind detection techniques known to one of ordinary skill in the art can be used to detect wind noise.
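A sketch of the wind noise decision of Equation (8) is given below. It assumes that the smoothed flatness has already been computed over the low-frequency subbands only, and that subband k corresponds to the frequency k·ƒ_(s)/N for an N-point transform; that mapping, the function names, and the default threshold are assumptions made for this example.

```python
def num_subbands_up_to(max_freq_hz, fft_size, sample_rate_hz):
    """Number of subbands from DC up to max_freq_hz, assuming subband k
    sits at k * sample_rate_hz / fft_size."""
    return int(max_freq_hz * fft_size / sample_rate_hz) + 1

def wind_noise_detected(smoothed_flatness_low, lnd_flag, theta_wnd=0.5):
    """Equation (8): wind noise is indicated only if the band-limited
    smoothed flatness exceeds the threshold and no local noise was
    detected (LND < 1). The threshold default is an assumption."""
    return 1 if (smoothed_flatness_low > theta_wnd and lnd_flag < 1) else 0
```

For example, with a 512-point transform at 16 kHz the subbands up to 2000 Hz are indices 0 through 64.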

In an exemplary embodiment, the diffuse sound detection module 654 indicates regions where diffuse sound sources may be active that might harm the speaker activity detection. Diffuse sounds are detected if the power across the microphones is similar. The diffuse sound detection module is based on the speaker activity detection measure χ_(m)^(SAD)(ℓ). To detect diffuse events, a certain positive threshold has to be exceeded by this measure in all of the available channels, whereas χ_(m)^(SAD)(ℓ) always has to be lower than a second, higher threshold.

In one embodiment, the double-talk module 356 estimates the maximum speaker activity detection measure based on the speaker activity detection measure χ_(m)^(SAD)(l) set forth above, with an increasing constant γ_(inc)^(χ) applied during fullband speaker activity if the current maximum is smaller than the currently observed SAD measure. The decreasing constant γ_(dec)^(χ) is applied otherwise, as set forth below.

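$\begin{matrix}{{\hat{\chi}}_{\max,m}^{SAD}(\ell) = \left\{ \begin{matrix}{{\hat{\chi}}_{\max,m}^{SAD}(\ell - 1) + \gamma_{inc}^{\chi},} & {{if}\mspace{14mu} \left( {\hat{\chi}}_{\max,m}^{SAD}(\ell - 1) < {\hat{\chi}}_{m}^{SAD}(\ell) \right) \land \left( \widetilde{SAD}_{m}(\ell) > 0 \right),} \\{\max\left\{ {\hat{\chi}}_{\max,m}^{SAD}(\ell - 1) - \gamma_{dec}^{\chi},\; - 1 \right\},} & {{else}.}\end{matrix} \right.} & (9)\end{matrix}$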

Temporal smoothing of the speaker activity measure maximum can be provided with γ_(SAD) as follows:

$\begin{matrix}{{\overset{\_}{\chi}}_{\max,m}^{SAD}(\ell) = \gamma_{SAD} \cdot {\overset{\_}{\chi}}_{\max,m}^{SAD}(\ell - 1) + (1 - \gamma_{SAD}) \cdot {\hat{\chi}}_{\max,m}^{SAD}(\ell).} & (10)\end{matrix}$

Double talk detection (DTD) is indicated if more than one channel shows a smoothed maximum measure of speaker activity larger than a threshold Θ_(DTD), as follows:

$\begin{matrix}{{DTD}(\ell) = \left\{ \begin{matrix}{1,} & {{if}\mspace{14mu} \left( \sum\limits_{m = 1}^{M} f\left( {\overset{\_}{\chi}}_{\max,m}^{SAD}(\ell),\Theta_{DTD} \right) \right) > 1,} \\{0,} & {{else}.}\end{matrix} \right.} & (11)\end{matrix}$

Here the function ƒ(x, y) performs a threshold decision:

$\begin{matrix}{{f\left( {x,y} \right)} = \left\{ \begin{matrix}{1,} & {{{{if}\mspace{14mu} x} > y},} \\{0,} & {{else}.}\end{matrix} \right.} & (12)\end{matrix}$

With the constant γ_(DTD)∈{0, . . . , 1}, we get a measure for detection of double-talk regions modified by an evaluation of whether double-talk has been detected for one frame:

$\begin{matrix}{{\overset{\_}{\chi}}^{DTD}(\ell) = \left\{ \begin{matrix}{\gamma_{DTD} \cdot {\overset{\_}{\chi}}^{DTD}(\ell - 1) + (1 - \gamma_{DTD}),} & {{if}\mspace{14mu} {DTD}(\ell) > 0,} \\{\gamma_{DTD} \cdot {\overset{\_}{\chi}}^{DTD}(\ell - 1),} & {{else}.}\end{matrix} \right.} & (13)\end{matrix}$

The detection of double-talk regions is followed by comparison with a threshold:

 (  ) = { 1 , if    _ DTD  (  ) > 0 , else . ( 14 )

FIG. 7 shows an exemplary microphone selection system 700 to select a microphone channel using information from a SNR module 702, an event detection module 704, which can be similar to the event detection module 604 of FIG. 6, and a robust SAD module 706, which can be similar to the robust SAD module 606 of FIG. 6, all of which are coupled to a channel selection module 708. A first microphone select/signal mixer 710, which receives input from M driver microphones, for example, is coupled to the channel selection module 708. Similarly, a second microphone select/signal mixer 712, which receives input from M passenger microphones, for example, is coupled to the channel selection module 708. As described more fully below, the channel selection module 708 selects the microphone channel prior to any signal enhancement processing. Alternatively, an intelligent signal mixer combines the input channels into an enhanced output signal. By selecting the microphone channel prior to signal enhancement, significant processing resources are saved in comparison with signal processing of all the microphone channels.

When a speaker is active, the SNR calculation module 702 can estimate SNRs for related microphones. The channel selection module 708 receives information from the event detection module 704, the robust SAD module 706 and the SNR module 702. If a local disturbance is detected on a single microphone, that microphone should be excluded from the selection. If there is no local distortion, the signal with the best SNR should be selected. In general, for this decision, the speaker should have been active.
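The selection rule described above can be sketched as follows. The function name and arguments are hypothetical, and holding the previous selection when no speaker is active (or when every microphone is disturbed) is an assumption of this sketch rather than a behavior stated in the text.

```python
from typing import Optional, Sequence


def select_channel(snr_db: Sequence[float],
                   disturbed: Sequence[bool],
                   speaker_active: bool,
                   previous: Optional[int] = None) -> Optional[int]:
    """Pick the microphone index with the best SNR among undisturbed channels.

    snr_db:         estimated SNR per microphone of one speaker.
    disturbed:      True where a local disturbance (e.g., wind) was detected.
    speaker_active: result of the speaker activity detection.
    previous:       last selected index, kept when no new decision is made.
    """
    if not speaker_active:
        return previous  # assumption: only decide while the speaker is active
    candidates = [m for m in range(len(snr_db)) if not disturbed[m]]
    if not candidates:
        return previous  # assumption: keep the old choice if all are disturbed
    return max(candidates, key=lambda m: snr_db[m])
```

For example, with SNR estimates of 10, 15 and 12 dB and a disturbance flagged on the second microphone, the sketch selects the third microphone.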

In one embodiment, the two selected signals, one driver microphone and one passenger microphone, can be passed to a further signal processing module (not shown) that can include noise suppression, for example. Since not all channels need to be processed by the signal enhancement module, the amount of processing resources required is significantly reduced.

In one embodiment adapted for a convertible car with two passengers and an in-car communication system, speech communication between driver and passenger is supported by picking up the speaker's voice over microphones on the seat belt or other structure, and playing the speaker's voice back over loudspeakers close to the other passenger. If a microphone is hidden or distorted, another microphone on the belt can be selected. For each of the driver and passenger, only the ‘best’ microphone will be further processed.

Alternative embodiments can use a variety of ways to detect events and speaker activity in environments having multiple microphones per speaker. In one embodiment, signal powers/spectra Φ_(SS) can be compared pairwise, e.g., in a symmetric microphone arrangement for two speakers in a car with three microphones on each seat belt. The top microphone m for the driver Dr can be compared to the top microphone of the passenger Pa, and similarly for the middle microphones and the lower microphones, as set forth below:

Φ_(SS,Dr,m)(l,k) ≷ Φ_(SS,Pa,m)(l,k)  (15)

Events, such as wind noise or body noise, can be detected for each group of speaker-dedicated microphones individually. The speaker activity detection, however, uses both groups of microphones, excluding microphones that are distorted.

In one embodiment, a signal power ratio (SPR) for the microphones is used:

$\begin{matrix}{{SPR}_{m}(\ell,k) = \frac{\Phi_{{SS},m}(\ell,k)}{\Phi_{{SS},m^{\prime}}(\ell,k)}} & (16)\end{matrix}$

Equivalently, comparisons using a coupling factor K that maps the power of one microphone to the expected power of another microphone can be used, as set forth below:

Φ_(SS,m)(l,k)·K_(m,m′)(l,k) ≷ Φ_(SS,m′)(l,k)  (17)

The expected power can be used to detect wind noise, such as if the actual power exceeds the expected power considerably. For speech activity of the passengers, specific coupling factors can be observed and evaluated, such as the coupling factors K above. The power ratios of different microphones are coupled in the case of a speaker, whereas this coupling is absent in the case of local distortions, e.g., wind or scratch noise.
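The signal power ratio and the coupling-factor comparison can be sketched for a single frame and subband as follows. The function names, the small `eps` guard against division by zero, and the `margin` factor that stands in for "considerably" are assumptions introduced for illustration only.

```python
def signal_power_ratio(power_m: float, power_m_prime: float,
                       eps: float = 1e-12) -> float:
    """Ratio of the signal power at microphone m to that at microphone m'.

    eps is a hypothetical guard that avoids division by zero.
    """
    return power_m / max(power_m_prime, eps)


def exceeds_expected_power(power_m: float, power_m_prime: float,
                           coupling: float, margin: float = 4.0) -> bool:
    """Flag microphone m' when its actual power is far above the expected power.

    The expected power at m' is the power at m mapped by the coupling
    factor. margin is a hypothetical factor quantifying "considerably".
    """
    expected = power_m * coupling
    return power_m_prime > margin * expected
```

With a coupling factor of 0.5, a power of 1.0 at microphone m gives an expected power of 0.5 at microphone m′, so an observed power of 10.0 would be flagged as a likely local disturbance while an observed power of 1.0 would not.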

FIG. 8 shows an exemplary computer 800 that can perform at least part of the processing described herein. The computer 800 includes a processor 802, a volatile memory 804, a non-volatile memory 806 (e.g., hard disk), an output device 807 and a graphical user interface (GUI) 808 (e.g., a mouse, a keyboard, a display). The non-volatile memory 806 stores computer instructions 812, an operating system 816 and data 818. In one example, the computer instructions 812 are executed by the processor 802 out of volatile memory 804. In one embodiment, an article 820 comprises non-transitory computer-readable instructions.

Processing may be implemented in hardware, software, or a combination of the two. Processing may be implemented in computer programs executed on programmable computers/machines that each includes a processor, a storage medium or other article of manufacture that is readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and one or more output devices. Program code may be applied to data entered using an input device to perform processing and to generate output information.

The system can perform processing, at least in part, via a computer program product (e.g., in a machine-readable storage device), for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). Each such program may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the programs may be implemented in assembly or machine language. The language may be a compiled or an interpreted language and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network. A computer program may be stored on a storage medium or device (e.g., CD-ROM, hard disk, or magnetic diskette) that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer. Processing may also be implemented as a machine-readable storage medium, configured with a computer program, where upon execution, instructions in the computer program cause the computer to operate.

Processing may be performed by one or more programmable processors executing one or more computer programs to perform the functions of the system. All or part of the system may be implemented as special purpose logic circuitry (e.g., an FPGA (field programmable gate array) and/or an ASIC (application-specific integrated circuit)).

Having described exemplary embodiments of the invention, it will now become apparent to one of ordinary skill in the art that other embodiments incorporating their concepts may also be used. The embodiments contained herein should not be limited to disclosed embodiments but rather should be limited only by the spirit and scope of the appended claims. All publications and references cited herein are expressly incorporated herein by reference in their entirety.

1. A method, comprising: for a communication system having microphones and loudspeakers, transforming a signal received by a first one of the microphones to the frequency domain; determining a noise level estimate for the transformed signal; determining a speech level estimate for the transformed signal; determining a SNR from the noise and speech level estimates; and determining a gain for the transformed signal to achieve a selected SNR range at a given position, wherein determining the gain comprises: adapting an actual gain to follow a target gain, wherein the target gain is adjusted to achieve the selected SNR range; comparing the target gain and the actual gain to determine a gain change increment; increasing the actual gain if the SNR at the given position is lower than a minimum SNR in the SNR range; and decreasing the actual gain if the SNR at the given position is higher than a maximum SNR in the SNR range.
 2. The method according to claim 1, further including adapting the SNR range in response to a change in the noise level.
 3. The method according to claim 1, further including using a psychoacoustic measure for determining the noise and/or speech level estimate.
 4. The method according to claim 3, further including determining a noise level estimate for the transformed signal using spectral weights based upon the psychoacoustic measure.
 5. The method according to claim 1, further including performing voice activity detection on the transformed signal and generating weighting factors from the voice activity detection to detect an active speaker.
 6. The method according to claim 5, further including determining mixer weights to select a speaker-specific gain for the active speaker.
 7. The method according to claim 5, further including attenuating the gain during speech pauses detected by the voice activity detection.
 8. The method according to claim 1, further including determining the gain for different frequencies.
 9. The method according to claim 1, further including, in a bi-directional system, setting a maximum gain to block a loudspeaker output of an inactive communication system.
 10. The method according to claim 9, further including using a first one of the microphones to determine the SNR at the given position.
 11. The method according to claim 1, wherein the given position corresponds to a position proximate an expected location of a user ear.
 12. An article, comprising: a non-transitory computer-readable medium having stored instructions that enable a machine to: for a communication system having microphones and loudspeakers, transform a signal received by a first one of the microphones to the frequency domain; determine a noise level estimate for the transformed signal; determine a speech level estimate for the transformed signal; determine a SNR from the noise and speech level estimates; and determine a gain for the transformed signal to achieve a selected SNR range at a given position, wherein determining the gain comprises: adapt an actual gain to follow a target gain, wherein the target gain is adjusted to achieve the selected SNR range; compare the target gain and the actual gain to determine a gain change increment; increase the actual gain if the SNR at the given position is lower than a minimum SNR in the SNR range; and decrease the actual gain if the SNR at the given position is higher than a maximum SNR in the SNR range.
 13. The article according to claim 12, further including instructions to adapt the SNR range in response to a change in the noise level.
 14. The article according to claim 12, further including instructions to use a psychoacoustic measure for determining the noise and/or speech level estimate.
 15. The article according to claim 12, further including instructions to perform voice activity detection on the transformed signal and generate weighting factors from the voice activity detection to detect an active speaker.
 16. The article according to claim 15, further including instructions to determine mixer weights to select a speaker-specific gain for the active speaker.
 17. The article according to claim 12, further including, in a bi-directional system, instructions to set a maximum gain to block a loudspeaker output of an inactive communication system.
 18. A communication system, comprising: microphones to receive sound in an enclosed environment; loudspeakers to generate sound into the enclosed environment; a sound processing module to transform a signal received by a first one of the microphones to the frequency domain; a noise estimate module to determine a noise level estimate for the transformed signal; a speech estimate module to determine a speech level estimate for the transformed signal; a gain control module comprising: a SNR module to determine a SNR from the noise and speech level estimates; and a gain module to determine a gain for the transformed signal to achieve a selected SNR range at a given position, the gain module including a processor configured to: adapt an actual gain to follow a target gain, wherein the target gain is adjusted to achieve the selected SNR range; compare the target gain and the actual gain to determine a gain change increment; increase the actual gain if the SNR at the given position is lower than a minimum SNR in the SNR range; and decrease the actual gain if the SNR at the given position is higher than a maximum SNR in the SNR range.
 19. The system according to claim 18, wherein the gain module is further configured to adapt the SNR range in response to a change in the noise level.
 20. The system according to claim 18, wherein the noise estimate module is configured to use a psychoacoustic measure for determining the noise level estimate. 